The Soundex Phonetic Algorithm Revisited for SMS Text Representation

Authors

  • David Pinto
  • Darnes Vilariño Ayala
  • Yuridiana Alemán
  • Helena Gómez-Adorno
  • Nahun Loya
  • Héctor Jiménez-Salazar
Abstract

The growing use of information technologies such as mobile devices has had a major social and technological impact, including the widespread adoption of the Short Message Service (SMS), a communication system broadly used by cellular phone users. In 2011, it was estimated that more than 5.6 billion mobile phones were sending between 30 and 40 SMS messages per month. It is therefore important to analyze representation and normalization techniques for this kind of text. In this paper we present an adaptation of the Soundex phonetic algorithm for representing SMS texts. We use the modified version of the Soundex algorithm to encode SMS messages, and we evaluate it by measuring the degree of similarity between two encoded texts: one originally written in natural language, and the other originally written in the SMS “sub-language”. Our main contribution is an improvement of the Soundex algorithm that raises the level of similarity between SMS texts and their corresponding texts in English or Spanish.
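As a rough illustration of the evaluation idea described in the abstract, the following minimal sketch encodes two texts word by word and compares the resulting code sequences. It uses the classic American Soundex rules, not the modified version proposed in the paper, whose details are not given here. The function names `soundex`, `encode_text`, and `similarity`, and the use of `difflib.SequenceMatcher` as the similarity measure, are assumptions made for this example only.

```python
import string
from difflib import SequenceMatcher

# Classic Soundex consonant classes (standard algorithm, not the paper's variant).
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word, length=4):
    """Return the classic Soundex code of a word, or "" if it has no ASCII letters."""
    # Keep only ASCII letters; accented characters are dropped in this sketch.
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code after it is kept
    return (code + "0" * length)[:length]


def encode_text(text):
    """Encode each whitespace-separated token, skipping tokens with no letters."""
    codes = (soundex(token) for token in text.split())
    return [c for c in codes if c]


def similarity(text_a, text_b):
    """Hypothetical similarity measure: match ratio of the two code sequences."""
    return SequenceMatcher(None, encode_text(text_a), encode_text(text_b)).ratio()


if __name__ == "__main__":
    print(soundex("tomorrow"), soundex("tmrw"))
    print(similarity("please text tomorrow", "pls txt tmrw"))
```

With the classic rules, abbreviations that only drop vowels (such as "tmrw" for "tomorrow") receive the same code as the full word, while letter-for-word substitutions (such as "c" for "see") do not, which is the kind of gap an SMS-oriented adaptation would need to address.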

Similar resources

Phonetic based SoundEx & ShapeEx algorithm for Sindhi Spell Checker System

This paper presents a novel combinational phonetic algorithm for the Sindhi language, to be used in developing a Sindhi spell checker, which had not been developed prior to this work. The compound textual forms and glyphs of the Sindhi language present a substantial challenge for developing a Sindhi spell checker system and generating a list of similar suggestions for misspelled words. In order to implement ...

Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique

This paper presents an algorithm for Thai-English cross-language transliterated word retrieval. The algorithm enables retrieval of documents containing either the English keywords or the corresponding English-to-Thai transliterated words. This is done by retrieving documents based on phonetic codes of keywords rather than the keywords themselves. The phonetic coding is based on the Soundex codin...

A Bangla Phonetic Encoding for Better Spelling Suggestions

We present a phonetic encoding for Bangla that can be used by spelling checkers to provide better suggestions for misspelled words. The encoding is based on the Soundex algorithm, modified to match Bangla phonetics. We start by analyzing the Soundex encoding scheme when applied to Bangla. Next, we propose a new encoding that handles the case of Bangla words, including those containing conjuncts. We ...

Soundex Algorithm for Indian Language Based on Phonetic Matching

In a system with a large database, there has always been the problem that names may be misspelled or not spelled the way one expected. As a result, the data in the database becomes degraded. In this case it is necessary to search for duplicates and merge them into a single entity. In doing so, one problem is the way in which the strings are compared. In such cases rather than lookin...

Improving Precision and Recall for Soundex Retrieval

We present a phonetic algorithm that fuses existing techniques and introduces new features. This combination offers improved precision and recall.

Publication date: 2012